-
Notifications
You must be signed in to change notification settings - Fork 12
integration: use NTS by default #292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9fe9dee
to
ec98623
Compare
Regarding "it: disable tablets for ConsistencyThreeNodeClusterTests ": |
// Currently we use Cassandra 3.11.19, which disallows the use of `replication_factor` with NTS. | ||
std::string strategy = Options::is_scylla() ? "'NetworkTopologyStrategy'" : "'SimpleStrategy'"; | ||
replication_strategy_s << strategy << ", 'replication_factor': "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the error message when trying to use replication_factor
with Cassandra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Configuration error: Database returned an error: The query is invalid because of some configuration issue, Error message: replication_factor is an option for SimpleStrategy, not NetworkTopologyStrategy"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik using 'replication_factor': n
is the same as specifying this same RF for each DC.
I think in this branch there is only one DC, right? In which case 'dc1': n
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but I'm not sure how ccm
works in case of creating a cluster with single dc - i.e. I don't know what's the default name for this single dc (if any). I'll need to check.
What I know for sure is that if there are multiple dcs, ccm calls them dc1
, dc2
, ..., dc_n
.
Assuming this is true (I haven't checked the values you asked for yet), why we can't satisfy repfactor for tablets, but we could do that for vnodes? |
You couldn't, but the DB did not verify that. For example, it was possible to create a keyspace with RF=4 on a 3-node cluster (but there would not be 4 copies of data). |
Makes sense, thanks. I'll investigate the |
Some features are not working with tablets enabled - for example LWTs. Tests using such features should explicitly disable tablets (if running against Scylla that supports tablets).
Almost all schema metadata tests (except for one) do some checks regarding the materialized view metadata. Let's disable the tablets for this suite.
I believe this is cleaner - we know which tests are failing by looking at the filter. This also enables `TableMetadataColumnOrder` test which I implemented a while ago.... I missed it previously.
They are using LWTs statements, which do not work with tablets.
They are using LWTs.
Regarding It looks like this suite explicitly sets the RF to 3. We want to keep RF=3 on the keyspace, decommission some node(s), and expect the errors on higher consistency levels (such as Since Scylla disallows us to decommission the node (which makes sense), we can simply rewrite the test to use Example test case ( // Perform a sanity check against a full healthy cluster (N=3, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_ALL);
select_.set_consistency(CASS_CONSISTENCY_ALL);
session_.execute(insert_);
session_.execute(select_);
// Decommission node two
decommission_node(2);
// Perform a check using consistency `QUORUM` (N=2, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_QUORUM);
select_.set_consistency(CASS_CONSISTENCY_QUORUM);
session_.execute(insert_);
session_.execute(select_);
// Perform a check using consistency `ONE` (N=2, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_ONE);
select_.set_consistency(CASS_CONSISTENCY_ONE);
session_.execute(insert_);
session_.execute(select_);
// Perform a check using consistency `TWO` (N=2, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_TWO);
select_.set_consistency(CASS_CONSISTENCY_TWO);
session_.execute(insert_);
session_.execute(select_);
// Perform a check using consistency `ALL` (should fail N=2, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_ALL);
select_.set_consistency(CASS_CONSISTENCY_ALL);
ASSERT_NE(CASS_OK, session_.execute(insert_, false).error_code());
ASSERT_NE(CASS_OK, session_.execute(select_, false).error_code());
// Perform a check using consistency `THREE` (should fail N=2, RF=3)
insert_.set_consistency(CASS_CONSISTENCY_THREE);
select_.set_consistency(CASS_CONSISTENCY_THREE);
ASSERT_NE(CASS_OK, session_.execute(insert_, false).error_code());
ASSERT_NE(CASS_OK, session_.execute(select_, false).error_code()); We can simply replace |
Makes sense |
ec98623
to
8facc9f
Compare
Rebased on master |
This one is interesting - it looks like `ccm decomission` command fails for Scylla with tablets. I'm including the error message: ``` Subprocess /home/runner/.ccm/cpp-driver_2025-1-2_3-0/node2/bin/scylla nodetool -h 127.0.0.2 -p 10000 decommission exited with non-zero status; exit status: 4; stderr: error executing POST request to http://127.0.0.2:10000/storage_service/decommission with parameters {}: remote replied with status code 500 Internal Server Error: std::runtime_error (Decommission failed. See earlier errors (Rolled back: Failed to drain tablets: std::runtime_error (Unable to find new replica for tablet baf48a70-297b-11f0-ba85-f2994cdb5698:1 on 9394531c-5a8b-46f3-b236-db5163ceaccc:1 when draining {9394531c-5a8b-46f3-b236-db5163ceaccc}. Consider adding new nodes or reducing replication factor. (nodes [0ae42828-9d81-4e67-9b61-a3914bc81d5b, 7734cef8-85b4-45f8-87d1-f89ae4c51f39], replicas [7734cef8-85b4-45f8-87d1-f89ae4c51f39:1, 0ae42828-9d81-4e67-9b61-a3914bc81d5b:1, 9394531c-5a8b-46f3-b236-db5163ceaccc:1]))). Request ID: bb5bfa8e-297b-11f0-a230-53d049d48e29) ``` This makes sense, because the test uses 3 nodes, with RF=3 - the decomission should fail, because otherwise we are not able to satisfy RF. Instead, we can stop the node so it does not cleanly migrate its data to other nodes (simulating its crash).
It uses counters - they are not supported with tablets. Note: This also disables tablets for BatchCounterThreeNodeClusterTests, which use counters as well.
This is basically the same test suite, but with tablets disabled. Counters are not supported with tablets.
Both test cases use LWTs (though one of them is not enabled yet).
8facc9f
to
8c4966c
Compare
v1.1:
|
Hmm...
|
Value |
By default, the tests used SimpleStrategy if there was a single DC. Using SimpleStrategy is discouraged. Apart from that, Scylla seems to disable the tablets by default for SimpleStrategy keyspaces - we want to prevent that. For Cassandra, we still will be using SimpleStrategy.
8c4966c
to
5287fa4
Compare
I've run this single test case 100 times in a loop locally against Scylla 5.4.8 - the error did not reproduce. |
For the tests with single DC clusters, we would always use
SimpleStrategy
. This PR addresses this, and changes it so we useNetworkTopologyStrategy
for Scylla clusters. For Cassandra clusters, we will still useSimpleStrategy
- it's because Cassandra3.11.19
(the version we currently test against) does not allowreplication_factor
option with NTS.For
SimpleStrategy
keyspaces, Scylla disables the tablets by default. When we transition to NTS, some tests start to fail - it is expected - some features are currently not working with tablets. In this PR, we disable the tablets for tests which:ccm
decomission seems to fail for nodes with tablets keyspaces - the exact error is included in corresponding commit's message).Pre-review checklist
[ ] I have implemented Rust unit tests for the features/changes introduced..github/workflows/build.yml
ingtest_filter
..github/workflows/cassandra.yml
ingtest_filter
.